Method and device for executing a decrypting mechanism through calculating a standardized modular exponentiation for thwarting timing attacks

ABSTRACT

An encrypting exponentiation modulo M is effected by a modular multiplication X*Y mod M, where M is a temporally steady but instance-wise non-uniform modulus. The method involves an iterative series of steps. Each step executes one or two first multiplications to produce a first result, and a trim-down reduction of the size of the first result by one or more second multiplications to produce a second result. The method furthermore takes a distinctive measure for keeping the final result of each step below a predetermined multiplicity of the modulus. In particular, the method postpones substantially any subtraction of the modulus as pertaining to the measure to a terminal phase of the modular exponentiation. This is possible through choosing in an appropriate manner one or more parameters figuring in the method. This further maintains overall temporal performance.

BACKGROUND OF THE INVENTION

The invention relates to a method according to the preamble of Claim 1. Encrypting by executing a standardized modular exponentiation is used in the environment of a smart card and elsewhere, such as for supporting financial operations, through blocking opportunity for falsifying the control or contents of such operations. Encryption can be expressed as y=<x^(e)>_(M), wherein x is a message, e is an encryption key, and M a modulus. Likewise, decryption is effected as D(y)=<y^(d)>_(M), wherein d is the decryption key, and retrieving x from D is straightforward. For a particular device, the values of M and e are known and fixed, the content of x to be encrypted is naturally unknown and variable, and the value of d is fixed but unknown. For certain operations, such as the providing of an encoded signature, the first encoding also operates with a secret key along similar lines. For the present description, such encoding is also called “decrypting”. Now, the decrypting is effected digit-wise. For each digit of D, one or two first multiplications X*Y mod M produce a first result. The attaining of such first result is followed by an addition. After attaining a second result, the next digit of D is processed. Prior technology has kept the size of the second result down by, in operation, subtracting an appropriate multiplicity (zero, one, or more) of the quantity M, because the register width of available hardware is adapted to the digit length, that is generally much less than the size of the overall quantities used in the multiplication.

It has been found that the sequential pattern of the above multiplicity may depend on the values of X, Y, and M. Further, the use of temporal statistics on a great number of mutually unrelated decryption operations with arbitrary messages allows a value for d to be derived. This renders the protection by the encryption illusory. Therefore, a need exists to mask these statistical variations by some additional affecting of the calculation procedure.

SUMMARY OF THE INVENTION

In consequence, amongst other things, it is an object of the present invention to suppress the relation between the value of the decryption key and the temporal structure of the calculating steps, through a masking mechanism that does not appreciably lengthen the calculations, nor would necessitate inordinate hardware facilities. Now therefore, according to one of its aspects, the invention is characterized by the characterizing part of Claim 1. In particular, the inventors have recognized that present day microcontrollers, even those that are used in the constraining environment of a smart card, can allow the use of longer storage registers than before, and in particular, a few bits longer than the digits used in the calculation. Such registers would provide the extra freedom that the present invention is in need of.

Advantageously, the procedure executes the exponentiation along the Quisquater or Barrett prescriptions. These are methods commonly in use, and the amending of their prosecution for adhering to the invention is minimal. The pattern of the calculation procedure no longer depends on the decryption key. This takes away any method for so deciphering the value of the decrypting key.

The invention also relates to a device arranged to implement the method of the invention. Further advantageous aspects of the invention are recited in dependent Claims.

BRIEF DESCRIPTION OF THE DRAWING

These and further aspects and advantages of the invention will be discussed in more detail hereinafter with reference to the disclosure of preferred embodiments, and in particular with reference to the appended Figures that show:

FIG. 1, a hardware block diagram of the invention;

FIG. 2, a flow chart of the invention.

DETAILED DISCLOSURE OF PREFERRED EMBODIMENTS

The so-called “timing attack” follows the recognizing that for various implementations of the modular exponentiation, the computation times of successive encryptions will vary slightly from one message to another. With knowledge of the implementation, precise measuring of computation times for a large number of messages may lead to obtaining the secret key. Experiments have demonstrated that such is feasible indeed. To understand the nature of the attack and possible countermeasures, we give the main elements of the RSA encryption/decryption method and some common implementations.

Messages are encoded by integers x in a range 0<x<M, for some fixed number M. For integers y, z, and N, y≡z mod N indicates that N divides y−z. Also, <y>_(N) denotes the remainder of y after division by N, that is the unique number r with 0≦r<N such that y=r+q.N for some integer q. If α is a real number, then └α┘ denotes the largest integer k≦α (truncating).

The RSA scheme is based on the difficulty to factor large numbers, as follows. Two communicating parties may agree on a number M, which typically is an (m=512)-bit number, and is the product M=p.q of two prime numbers p and q that are kept secret, and each have approximately m/2 bits. The parties also agree on a private key number d and a public key number e. The numbers M and e are made public, while the number d may be put into a tamper-resistant module of a smart card that is given to the user party. The private key d must remain unknown to the user. Instructions to change the account of the user are sent in encrypted form to the smart card which then uses the private key d to decrypt the instructions to amend the account. The smart card is considered “hacked” if a user obtains the private key d, and might so for instance instruct the card to increase the account. The numbers d and e must satisfy

d.e≡1 mod lcm(p−1,q−1),

where lcm(a,b) is the least common multiple of a and b, the smallest positive integer divisible by both a and b. Given a modulus N and an integer c for which gcd(c,N)=1, that is, with c and N relatively prime (without a common divisor), it is easy to compute a number c′ such that c.c′≡1 mod N. To transfer a secret message x, 0<x<M, to a user, the number E(x)=<x^(e)>_(M) is sent instead. From an encoded message y, the card computes D(y)=<y^(d)>_(M). Note that if y=<x^(e)>_(M), then D(y)≡(x^(e))^(d)≡x^(d.e)≡x mod M. This equality follows from the fact that y^(f)≡y mod M holds for all y if and only if f≡1 mod lcm(p−1,q−1).
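The key relation and the round trip D(E(x))=x can be checked on a toy instance. The sketch below is illustrative only: the primes 61 and 53, the public key 17, and the helper names `lcm`, `encrypt`, and `decrypt` are assumptions chosen for readability and are far too small for any real use.

```python
from math import gcd

def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

# Toy parameters (assumed for illustration; real moduli have hundreds of bits).
p, q = 61, 53
M = p * q                    # public modulus
lam = lcm(p - 1, q - 1)      # lcm(p-1, q-1)
e = 17                       # public key, relatively prime to lam
d = pow(e, -1, lam)          # private key with d.e = 1 mod lam (Python 3.8+)

def encrypt(x: int) -> int:
    """E(x) = <x^e>_M."""
    return pow(x, e, M)

def decrypt(y: int) -> int:
    """D(y) = <y^d>_M."""
    return pow(y, d, M)

# Every message 0 < x < M is recovered by decrypting its encryption.
print(all(decrypt(encrypt(x)) == x for x in range(1, M)))
```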

The security of the RSA scheme depends on the difficulty to recover <x>_(M) from <x^(e)>_(M) without knowing the private key d. For arbitrary x this problem appears as difficult as inverting e modulo lcm(p−1,q−1), i.e. finding d without knowing p and q, which is as difficult as factoring M.

Modular Exponentiation

The main operation in an RSA scheme is modular exponentiation, y→x=<y^(d)>_(M). Often, this operation is implemented as follows. Write ${d = {\sum\limits_{i = 0}^{m - 1}{d_{i}2^{i}}}},$

where d_(i)ε{0,1}, the binary representation of d. Put x^((m))=1, and compute x^((m−1)), x^((m−2)), . . . , x⁽¹⁾, x⁽⁰⁾ recursively as

x^((k))=<<(x^((k+1)))²>_(M)·y^(d_(k))>_(M).  (1)

We now have x=x⁽⁰⁾. The original message x is computed from the received encrypted message y in m steps, each step consisting of a squaring modulo M followed by a multiplication modulo M if the corresponding key-bit is 1. From (1) we see that exponentiation is done through repeated modular multiplications, that is, the operation

(x,y)→z=<x.y>_(M).  (2)
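Recursion (1) can be written out as a short routine. This is a minimal sketch; the function name `modexp_binary` is an assumption, and Python's built-in `%` stands in for the modular multiplication (2) that the remainder of the description refines.

```python
def modexp_binary(y: int, d: int, M: int) -> int:
    """Left-to-right square-and-multiply following recursion (1)."""
    x = 1 % M                                   # x^(m) = 1
    for k in reversed(range(d.bit_length())):   # k = m-1, ..., 0
        x = (x * x) % M                         # squaring modulo M
        if (d >> k) & 1:                        # key bit d_k
            x = (x * y) % M                     # multiplication only if d_k = 1
    return x
```

Each loop turn performs one squaring and, depending on the key bit, one further multiplication, so an exponent of m bits costs between m and 2m modular multiplications.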

Most systems simplify this multiplication by grouping m bits of the number y into digits of b bits, wherein b may arbitrarily range from b=1 to b=32. Thus, y is written as ${y = {\sum\limits_{i = 0}^{{\lfloor{n/b}\rfloor} - 1}{y_{i}2^{bi}}}},$

where 0≦y_(i)<2^(b). Formula (2) is calculated recursively by putting z_(└n/b┘)=0, and computing z_(└n/b┘−1), . . . ,z₀ successively by

z_(i)=<<x.y_(i)>_(M)+z_(i+1)2^(b)>_(M).  (3)
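Recursion (3) processes the digits of y from the most significant one downwards. The sketch below assumes a hypothetical helper name `mulmod_digits` and rounds the digit count up, so that a partial top digit is not dropped; the reduction itself is still done with `%`, which is precisely the step the methods discussed below replace.

```python
def mulmod_digits(x: int, y: int, M: int, b: int = 8) -> int:
    """Compute <x.y>_M by scanning y in b-bit digits, most significant first."""
    ndigits = max(1, -(-y.bit_length() // b))   # ceil(bits / b), at least one
    mask = (1 << b) - 1
    z = 0
    for i in reversed(range(ndigits)):
        y_i = (y >> (b * i)) & mask             # digit y_i, 0 <= y_i < 2^b
        z = ((x * y_i) % M + (z << b)) % M      # recursion (3)
    return z
```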

It is not attractive to implement the operation

(x,y_(i))→<x.y_(i)>_(M)  (4)

“as is”, for the following reason. The number u=x.y_(i) is an (m+b)-bit number. The number <u>_(M) is obtained as $\begin{matrix} {{\langle u\rangle}_{M} = {u - {\left\lfloor \frac{u}{M} \right\rfloor \cdot {M.}}}} & (5) \end{matrix}$

This computation needs a multiplication and a division of u by the m-bit number M. However, normal division of large numbers is much more complex than multiplication. Therefore, various methods have replaced the direct implementation of (5) by implementations that use a few (typically, two or three) multiplications, possibly followed by a few (typically, one or two) subtractions of M.

Several methods use special representations for numbers modulo M. This needs converting from ordinary representation to special representation and back. The converting is done only once at the start and once at the end of a modular exponentiation. In between, many modular multiplications are computed, so the extra overhead is negligible. We will give two such methods in detail.

The additional subtractions that sometimes must follow a modular multiplication make timing attacks feasible. In such attack, the smart card must decrypt a large number of messages, and a statistical analysis of the decryption times enables an attacker to recover the bits of the private key d. The invention adapts known methods for modular multiplication so that these extra subtractions are no longer needed.

Two Methods for Modular Multiplication

In the Quisquater method, all reduction is done modulo some multiple N of M, where the first p most significant bits of N are all equal to 1, that is, the modulus N is an n-bit number for which

2^(n)−2^(n−p)≦N<2^(n)  (7)

At the end of the exponentiation modulo N, the result is reduced modulo M to get the desired answer. To compute a modular multiplication

(x,y)→z=<x.y>_(N),  (8)

the n-bit number y is partitioned into blocks of b≦p−1 bits and z is found recursively by multiplying x.y_(i) similarly to (3). Expression (5) is replaced by the “Quisquater-reduction”: $\begin{matrix} {{Q(u)} = {u - {\left\lfloor \frac{u}{2^{n}} \right\rfloor \cdot {N.}}}} & (9) \end{matrix}$

Remark 1: Note that Q(u)≡<u>_(N) mod N. Only, we cannot guarantee that Q(u)<N. For large u the number Q(u) can indeed be bigger than N. However, we can show that Q(u)<θN if u<2^(p)(θ−1)N, so the Quisquater-reduction is “almost as good” as the required residue-operation.

Now, the result z of the multiplication is computed recursively by putting z_(└n/b┘)*=0 and

z_(i)*=Q(x.y_(i)+z_(i+1)*·2^(b)).  (10)

Note that z*:=z₀*≡x.y mod N holds. We may show that for b≦p−1 and 0≦x<N, we have 0≦z_(i)*<3N for all i. So the result z=<x.y>_(N) is obtained from z* by subtracting N at most twice. We will prove this later.
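The Quisquater-reduction (9) and recursion (10) need only a shift, a multiplication, and a subtraction per digit. The following sketch uses assumed names (`quisquater_reduce`, `quisquater_mul`) and an assumed small modulus N=61441 with n=16, p=4, b=3; any N satisfying (7) would serve equally well.

```python
def quisquater_reduce(u: int, N: int, n: int) -> int:
    """Q(u) = u - floor(u / 2^n) * N, formula (9)."""
    return u - (u >> n) * N

def quisquater_mul(x: int, y: int, N: int, n: int, b: int) -> int:
    """Return z* congruent to x.y mod N via recursion (10), without final subtractions."""
    ndigits = -(-n // b)                        # ceil(n / b) digits of b bits
    mask = (1 << b) - 1
    z = 0
    for i in reversed(range(ndigits)):
        y_i = (y >> (b * i)) & mask
        z = quisquater_reduce(x * y_i + (z << b), N, n)
    return z

N, n, b = 61441, 16, 3       # 2^16 - 2^12 <= N < 2^16, hence p = 4 and b = p - 1
z = quisquater_mul(31000, 2, N, n, b)
print(z % N == (31000 * 2) % N, 0 <= z < 3 * N)
```

The returned value is in general not the residue itself; it only lies below 3N, which is why the classical method follows up with zero, one, or two subtractions of N.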

The Barrett method uses the given modulus M itself. The modular reduction <x>_(M) of a number x is estimated by $\begin{matrix} {{{B(x)} = {x - {\left\lfloor {\left\lfloor \frac{x}{b^{n - 1}} \right\rfloor \cdot \frac{R}{b^{n + 1}}} \right\rfloor \cdot M}}},} & (11) \end{matrix}$

where $R = \left\lfloor \frac{b^{2n}}{M} \right\rfloor$

and n is chosen such that b^(n−1)≦M<b^(n). The product z=xy of two numbers x and y is calculated as follows:

(i) z₀=xy;

(ii) z=z₀*=B(z₀).

We will prove that if 0≦x,y<M, the result z obeys 0≦z<3M, so that we need at most two extra reductions to obtain <xy>_(M). For b≧3 the computation of B(z₀) may be simplified. Let indeed $u = {\left\lfloor {\left\lfloor \frac{x}{b^{n - 1}} \right\rfloor \cdot \frac{R}{b^{n + 1}}} \right\rfloor \cdot M}$

so that z=x−u. By the above remark, z<3M<b^(n+1). From z≡x−u mod b^(n+1), we conclude that $z = {{\langle z\rangle}_{b^{n + 1}} = \left\{ \begin{matrix} {{{\langle x\rangle}_{b^{n + 1}} - {\langle u\rangle}_{b^{n + 1}}},} & {{{if}\quad {this}\quad {expression}\quad {is}} \geq 0} \\ {b^{n + 1} + {\langle x\rangle}_{b^{n + 1}} - {\langle u\rangle}_{b^{n + 1}}} & {{otherwise}.} \end{matrix} \right.}$
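Formula (11) can be exercised numerically. The sketch below is a plain-integer rendering under assumed names (`barrett_setup`, `barrett_reduce`); it does not implement the truncation to n+1 digits described just above, and the modulus 3233 with b=4 is an illustrative choice.

```python
def barrett_setup(M: int, b: int):
    """Return (n, R) with b^(n-1) <= M < b^n and R = floor(b^(2n) / M)."""
    n = 1
    while b ** n <= M:
        n += 1
    return n, b ** (2 * n) // M

def barrett_reduce(x: int, M: int, b: int, n: int, R: int) -> int:
    """B(x) = x - floor(floor(x / b^(n-1)) * R / b^(n+1)) * M, formula (11)."""
    q = ((x // b ** (n - 1)) * R) // b ** (n + 1)   # estimate of floor(x / M)
    return x - q * M

M, b = 3233, 4
n, R = barrett_setup(M, b)
z = barrett_reduce(3000 * 3100, M, b, n, R)
print(z % M == (3000 * 3100) % M, 0 <= z < 3 * M)
```

Only the precomputed constant R involves a division by M; every later reduction uses multiplications and divisions by powers of b, which are digit shifts in b-ary hardware.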

Improvements of the Algorithms for Modular Multiplication

Timing attacks are feasible because the modular multiplication may or may not require additional subtractions of the modulus. However, these additional reductions may be avoided: by a slight change of the original assumptions, we can work throughout the modular exponentiation with the unreduced results, and do any reduction only at the very end of the modular exponentiation. To show that this works, we will provide upper bounds on the intermediate results of our modular multiplications for each algorithm.
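The data dependence of these additional subtractions is easy to observe. The sketch below counts them for a multiplication with Quisquater-reduction (9); the function name `extra_subtractions` and the parameters N=61441, n=16, b=3 are assumptions chosen so that the effect shows up with small operands.

```python
def extra_subtractions(x: int, y: int, N: int, n: int, b: int) -> int:
    """Count the final subtractions of N after a multiplication using Q."""
    mask = (1 << b) - 1
    ndigits = -(-n // b)                       # ceil(n / b) digits of b bits
    z = 0
    for i in reversed(range(ndigits)):
        y_i = (y >> (b * i)) & mask
        u = x * y_i + (z << b)
        z = u - (u >> n) * N                   # Q(u), formula (9)
    count = 0
    while z >= N:                              # data-dependent clean-up
        z -= N
        count += 1
    return count

N, n, b = 61441, 16, 3      # 2^16 - 2^12 <= N < 2^16, so p = 4 and b <= p - 1
print(extra_subtractions(1, 1, N, n, b))        # 0
print(extra_subtractions(31000, 2, N, n, b))    # 1
```

Two operand pairs of the same length thus take a different number of steps, and it is this variation that the measures described below remove from the inner part of the exponentiation.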

For the modified Quisquater method we assume that we have a modulus M for which 2^(m−1)≦M<2^(m). Now, we compute a number N=cM for which holds 2^(n)−2^(n−p)≦N<2^(n). This is always possible if n≧m+p, because the admissible interval for N then has length 2^(n−p)≧2^(m)>M, so that some multiple of M must fall within this interval. All intermediate computations are done modulo N, instead of modulo M, with a reduction modulo M at the very end. N is a multiple of M, so no information is lost.
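A multiple N=cM in the admissible interval can be found with one integer division. In this sketch the function name `quisquater_modulus`, the choice n=m+p, and the choice of the smallest suitable c are assumptions; the text only requires that some such multiple be used.

```python
def quisquater_modulus(M: int, p: int):
    """Return (N, n, c) with N = c*M and 2^n - 2^(n-p) <= N < 2^n, for n = m + p."""
    m = M.bit_length()                  # 2^(m-1) <= M < 2^m
    n = m + p
    low = (1 << n) - (1 << (n - p))     # lower end of the admissible interval
    c = -(-low // M)                    # smallest c with c*M >= low
    N = c * M                           # N < low + M <= 2^n since M < 2^(n-p)
    return N, n, c

print(quisquater_modulus(3233, 5))      # (129320, 17, 40)
```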

As before, to get the result z=xy of a modular multiplication of x and ${y = {\sum\limits_{i = 0}^{{\lfloor{n/b}\rfloor} - 1}{y_{i}2^{bi}}}},$

we use the following:

(i) z_(└n/b┘)*=0;

(ii) For i=└n/b┘−1, └n/b┘−2, . . . ,0, we compute

z_(i)=x.y_(i)+z_(i+1)*·2^(b) and z_(i)*=Q(z_(i));

(iii) z=z₀*.

We will need the following facts about this algorithm.

Proposition 3.3. If 0≦x<αN and 2^(p)+α2^(b)≦(2^(p)−2^(b))θ, then for all i

0≦z_(i)*<θN  (12).

Proof: If 0≦x<αN, N≧2^(n)−2^(n−p), 0≦y_(i)<2^(b), and 0≦z_(i+1)*<θN, then

z_(i)<αN2^(b)+θN2^(b)=(α+θ)N2^(b),

hence ${\left( {{Q\left( z_{i} \right)} - {\langle z_{i}\rangle}_{N}} \right)/N} = {{\left\lfloor \frac{z_{i}}{N} \right\rfloor - \left\lfloor \frac{z_{i}}{2^{n}} \right\rfloor} < {1 + \frac{z_{i}}{N} - \frac{zi}{2n}}}$

=1+z_(i)(2^(n)−N)/N2^(n)

<1+(α+θ)N2^(b)2^(n−p)/N2^(n)

=1+(α+θ)2^(b−p)≦θ.

Now Q(z_(i))≡z_(i)≡<z_(i)>_(N) mod N, hence the number (Q(z_(i))−<z_(i)>_(N))/N is an integer; since according to the above this number is <θ, we conclude that (Q(z_(i))−<z_(i)>_(N))/N≦θ−1. Since also <z_(i)>_(N)<N by definition, it follows that: z_(i)*=Q(z_(i))<θN.

Obviously, the condition of Proposition 3.3 can only be satisfied if p≧b+1. Further, if p=b+1, then this condition is equivalent to θ≧α+2. So if α=1, then the number z resulting from the above algorithm always satisfies 0≦z<3N, so that at most two further reductions are required, a result used hereabove.

If all results of modular multiplications must be less than αN, we need θ=α, and it is necessary and sufficient that

p≧b+2, θ=α≧2.
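The step from the hypothesis of Proposition 3.3 to this condition can be spelled out as follows. The derivation assumes that α is restricted to integer values, which is not stated explicitly above.

```latex
\begin{aligned}
2^{p} + \alpha\,2^{b} \le (2^{p} - 2^{b})\,\alpha
  &\iff 2^{p} \le \alpha\,(2^{p} - 2^{b+1}) \\
  &\iff p \ge b+2 \quad\text{and}\quad
        \alpha \ge \frac{2^{p}}{2^{p} - 2^{b+1}} = \frac{1}{1 - 2^{\,b+1-p}} .
\end{aligned}
```

For p≧b+2 the right-hand bound lies between 1 and 2 and equals 2 exactly when p=b+2, so the smallest admissible integer is α=2 in every case.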

So, provided that p≧b+2, during modular exponentiation we may forego additional reductions after each modular multiplication and still guarantee that all results are non-negative and <2N. The result of the modular exponentiation is obtained by at most one reduction at the very end, plus a reduction z→<z>_(M).
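A complete exponentiation with all subtractions deferred to the end may be sketched as follows. The function name `modexp_quisquater`, the choices p=b+2 and n=m+p, and the smallest-multiple rule for N are assumptions; the internal `assert` merely documents the bound z<2N and would not be part of a device implementation. The sketch addresses only the subtractions of the modulus, and Python integer arithmetic is of course not itself constant-time.

```python
def modexp_quisquater(y: int, d: int, M: int, b: int = 4) -> int:
    """<y^d>_M with Quisquater-reduction only; N is subtracted solely at the end."""
    p = b + 2                                   # condition for alpha = theta = 2
    n = M.bit_length() + p
    low = (1 << n) - (1 << (n - p))
    N = -(-low // M) * M                        # multiple of M, low <= N < 2^n
    ndigits = -(-(n + 1) // b)                  # operands may reach 2N < 2^(n+1)
    mask = (1 << b) - 1

    def mul(u: int, v: int) -> int:
        z = 0
        for i in reversed(range(ndigits)):
            h = u * ((v >> (b * i)) & mask) + (z << b)
            z = h - (h >> n) * N                # Q(h); no conditional subtraction
            assert 0 <= z < 2 * N               # bound of Proposition 3.3
        return z

    x = 1
    for k in reversed(range(d.bit_length())):
        x = mul(x, x)
        if (d >> k) & 1:
            x = mul(x, y)
    if x >= N:                                  # at most one reduction at the end
        x -= N
    return x % M                                # final reduction modulo M

print(modexp_quisquater(2790, 413, 3233) == pow(2790, 413, 3233))
```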

The Barrett method can be modified in a way similar to Quisquater's. Assume that modulus M obeys b^(n−1)≦M<b^(n). To compute the result z=xy of a modular multiplication of x and $y = {\sum\limits_{i = 0}^{{\lfloor{n/r}\rfloor} - 1}{y_{i}b^{ri}}}$

in b-ary notation, we use the following. Define ${B_{k,l}(u)} = {u - {\left\lfloor {\left\lfloor \frac{u}{b^{k}} \right\rfloor \cdot {\left\lfloor \frac{b^{k + l}}{M} \right\rfloor/b^{l}}} \right\rfloor {M.}}}$

(i) z_(└n/r┘)*=0

(ii) For i=└n/r┘−1, . . . , 0, we compute

z_(i)=x.y_(i)+z_(i+1)*·b^(r)

 and

z_(i)*=B_(k,l)(z_(i));

(iii) z=z₀*.

Proposition 3.4: If 0≦x<αM, 0≦z_(i+1)*≦βM, and

k=n−1, l=r+1, 2+α+β≦θ,  (13)

or

k=n, l=r+1, α=β=θ, θ≧max (2b/(b−2),(1+b)b²/(b²−2)),  (14)

or, for example,

b=4, k=n−1, l=r+2, α=β=θ, θ≧3,  (15)

or

b=4, k=n−2, l=r+4, α=β=θ, θ≧2,  (16)

then 0≦z_(i)*<θM.

Proof: If 0≦x<αM, b^(n−1)≦M<b^(n), 0≦y_(i)<b^(r), and 0≦z_(i+1)*≦βM, then

z_(i)<αMb^(r)+βMb^(r)=(α+β)Mb^(r),

hence $\begin{aligned} \left( B(z_{i}) - \langle z_{i}\rangle_{M} \right)/M &= \left\lfloor \frac{z_{i}}{M} \right\rfloor - \left\lfloor \left\lfloor \frac{z_{i}}{b^{k}} \right\rfloor \left\lfloor \frac{b^{k + l}}{M} \right\rfloor / b^{l} \right\rfloor \\ &< 1 + \frac{z_{i}}{M} - \left\lfloor \frac{z_{i}}{b^{k}} \right\rfloor \left\lfloor \frac{b^{k + l}}{M} \right\rfloor / b^{l} \\ &< 1 + \frac{z_{i}}{M} - \left( \frac{z_{i}}{b^{k}} - 1 \right)\left( \frac{b^{k + l}}{M} - 1 \right) / b^{l} \\ &= 1 + \frac{b^{k}}{M} + \frac{z_{i}}{b^{k + l}} - \frac{1}{b^{l}} \\ &< 1 + \frac{b^{k}}{M} + \frac{(\alpha + \beta) M b^{r}}{b^{k + l}} \end{aligned}$

 ≦1+max(b^(k−n+1)+(α+β)b^(n−1+r−k−l), b^(k−n)+(α+β)b^(n+r−k−l)).

The last inequality follows from b^(n−1)≦M<b^(n), and from the fact that the function a/M+cM, with a,c≧0, is convex in M and therefore becomes maximum when M is either minimal or maximal. An interesting result needs k+l≧n+r. To limit the size of the necessary multiplication to compute B_(k,l), k should be as large and k+l as small as possible. Each additional condition (13), (14), (15), and (16) then implies this last expression to be at most equal to θ.

Now B(z_(i))≡z_(i)≡<z_(i)>_(M) mod M, hence (B(z_(i))−<z_(i)>_(M))/M is an integer; since this number is less than θ, it follows that (B(z_(i))−<z_(i)>_(M))/M≦θ−1. Since also <z_(i)>_(M)<M by definition, we conclude that z_(i)*=B(z_(i))<θM.

The above is used in several ways. One way takes r=n, k=n−1, and l=n+1. Then we compute z=B_(k,l)(xy)=B(xy) in one step, so β=0. With α=1, condition (13) gives θ=3, so Proposition 3.4 states that if x<M, then z<3M. This proves the earlier claim.

Alternatively, b is taken small, typically b=4, and k=n, l=r+1, and α=β=θ. Since (1+b)b²/(b²−2)≧2b/(b−2) for b≧4, the result in Proposition 3.4 states that all intermediate results will be <θM if θ≧(1+b)b²/(b²−2) and b≧4, that is, <6M when b=4. Similarly, for b=4, if we let k=n−2, l=r+4, α=β=θ=2, then all intermediate results will be <2M and if we let k=n−1, l=r+2, α=β=θ=3, then all intermediate results will be <3M. Therefore, the modular exponentiation may forego additional reductions after each modular multiplication and still guarantee that all intermediate results are non-negative and less than θM. Here θ is a number between 2 and 6, depending on the choice for k and l. The final result is obtained via only a few reductions at the very end of the modular exponentiation.
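The bounds for the generalized reduction B_(k,l) can be checked numerically. In the sketch below the names `barrett_kl` and `barrett_mul`, the modulus 3233, and the digit size r=2 are assumptions; the digits of y are taken in base b^(r), and operands are allowed to range up to θM as in the modified method.

```python
def barrett_kl(u: int, M: int, b: int, k: int, l: int, mu: int) -> int:
    """B_{k,l}(u) = u - floor(floor(u / b^k) * mu / b^l) * M, mu = floor(b^(k+l) / M)."""
    return u - (((u // b ** k) * mu) // b ** l) * M

def barrett_mul(x: int, y: int, M: int, b: int, r: int, k: int, l: int) -> int:
    """Digit-wise product congruent to x.y mod M, without subtractions of M."""
    mu = b ** (k + l) // M              # precomputed once per modulus
    base = b ** r
    digits = []
    while y:                            # digits y_i of y in base b^r, low first
        digits.append(y % base)
        y //= base
    z = 0
    for y_i in reversed(digits):        # most significant digit first
        z = barrett_kl(x * y_i + z * base, M, b, k, l, mu)
    return z

M, b, n, r = 3233, 4, 6, 2              # 4^5 <= 3233 < 4^6
z = barrett_mul(9000, 9500, M, b, r, n - 1, r + 2)   # operands below 3M
print(z % M == (9000 * 9500) % M, 0 <= z < 3 * M)
```

With k=n−1, l=r+2 the results stay below 3M, and with k=n−2, l=r+4 below 2M, at the price of a longer constant ⌊b^(k+l)/M⌋ in the second case.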

A third method for implementing modular exponentiation, well known by itself, is due to Montgomery. An improvement to this particular method has been disclosed in M. Shand & J. Vuillemin, Fast Implementation of RSA Cryptography, in Proc. 11th Symposium on Computer Arithmetic, IEEE 1993, p.252-259. In this reference, certain normalizing proposals to the Montgomery method are made to obviate the need for repeated normalizations after intermediate processing steps. The object of the reference was to increase the overall speed of the processing. The present invention on the other hand, has shown that timing attacks may be thwarted by initial operand conversions plus some minimal hardware facilities for non-Montgomery algorithms. Such timing attacks have not figured in the setting of the above citation. Furthermore, the Quisquater and Barrett methodologies have been herein disclosed expressly by way of non-limiting embodiments only.

FIG. 1 is a hardware block diagram of a device according to the invention. The operand memory 20 is as shown based on the modular storage of 8-bit digits. Address sequencer 22 successively addresses the various digit locations for reading and writing, as the case may be. Processing element 24 and address sequencer 22 operate in mutual synchronism through interconnection 21. Processing element 24 has an input register 26 for a first digit that may be received as read from memory 20. Furthermore, it has an input register 30 for a second digit through retrocoupling from its result register 28. The latter has an enlarged length with respect to the digit length. A selecting register 32 allows digit-based retrostorage into memory 20. The processing element may execute normalizing, preprocessing and postprocessing as described earlier, and further the standard modular multiplication of the Quisquater, Barrett, and similar non-Montgomery methods. The particular operations are governed through control register 30.

FIG. 2 is a flow chart of the invention. In block 50, the operation is started, which may need claiming of various hardware and software facilities. In block 52, the encrypted message is received. In block 54, the message is preprocessed in the manner described for any applicable algorithm. In block 56, one turn of the inner loop is executed, that calculates an intermediate result on the basis of two b-ary digits. In block 58, the system detects whether the inner loop in question has been executed a sufficient number of times (ready?). If no, the system reverts to block 56. If yes, the system proceeds to block 60 and executes one turn of the outer loop. Subsequently, in block 62, the system detects whether the outer loop in question has been executed a sufficient number of times (ready?). If no, the system reverts to block 56 for further executing the inner loop. If yes, the system proceeds to block 64 for postprocessing the final results, and subsequently to block 66 for outputting the result to a user, such as the central processing facility of the smart card in question. The combination of FIGS. 1, 2, in combination with the extensive further disclosure, is deemed to give the skilled art practitioner sufficient teachings as to how to implement the invention.

Summary of the Methods

Input: {overscore (x)}, d, {overscore (M)}, 0≦{overscore (x)}, d, {overscore (M)}<α^({overscore (n)}). (Typically, α=2). {overscore (x)} is the encrypted message.

Output: {overscore (z)}=<{overscore (x)}^(d)>_({overscore (M)}). First, the standard methods are reviewed.

a. Preprocessing

Quisquater: x={overscore (x)}, n={overscore (n)}+p, M=c{overscore (M)}, where p is some integer and c is chosen such that α^(n)−α^(n−p)≦M<α^(n). This produces a unique choice for c. Also, α=2.

Barrett: n={overscore (n)}, M={overscore (M)}, x={overscore (x)}.

b. Partitioning of d

Write $d = {\sum\limits_{i = 0}^{m - 1}{d_{i}{\beta^{i}.}}}$

 Typically, β=2 and m=#bits of d.

c. Outer loop

z←1

repeat for i=m−1→0:

z←Mult(z, z; M) (if β=2; in general z←<z^(β)>_(M).)

z←Mult(z, x^(d_(i)); M) (only needed if d_(i)>0).

endrepeat

d. Implementation of modular multiplication in operation Mult

The implementation of z←Mult(u, v; M) assumes 0≦u, v<M, and the result z satisfies 0≦z<M.

e. Partitioning

Write ${v = {\sum\limits_{i = 0}^{n^{\prime} - 1}{v_{i}B^{i}}}},$

where B=α^(b) for some integer b. In other words, the n α-ary digits of v are grouped in n′ blocks of b digits each. (So n=n′b.)

Moreover, Quisquater assumes that b≦p−1; Barrett takes b=n, n′=1.

f. Inner Loop

z←0

repeat for i:

h←z.F+u.v_(i)

z←R(h)

endrepeat

while z≧M do z←z−M. (17)

Here, we have the following.

1. i=n′−1→0 in Quisquater and Barrett,

2. F=B=α^(b) (Quisquater and Barrett)

3. ${R(h)} = \left\{ \begin{matrix} {{{Q(h)}:={h - {\left\lfloor \frac{h}{\alpha^{n}} \right\rfloor \cdot M}}},} & {({Quisquater}),} \\ {{{B(h)}:={{B_{{n - 1},{n + 1}}(h)} = {h - {\left\lfloor {\left\lfloor \frac{h}{\alpha^{n - 1}} \right\rfloor \cdot {\left\lfloor \frac{\alpha^{2n}}{M} \right\rfloor/\alpha^{n + 1}}} \right\rfloor \cdot M}}}},} & ({Barrett}) \end{matrix} \right.$

The Barrett reduction operation B used here is a special case of the general Barrett reduction ${B_{k,l}(h)}:={h - {\left\lfloor {\left\lfloor \frac{h}{\alpha^{k}} \right\rfloor \cdot {\left\lfloor \frac{\alpha^{k + l}}{M} \right\rfloor/\alpha^{l}}} \right\rfloor \cdot M}}$

4. In Quisquater and Barrett: 0≦z<3M in all steps.

g. Postprocessing

Here z satisfies 0≦z<M.

Quisquater: {overscore (z)}←<z>_({overscore (M)}).

Barrett: {overscore (z)}←z.

RESUME OF CONDITIONS AND PROPERTIES IN “OLD” ALGORITHMS

Quisquater

p≧b+1

Barrett

New Methods

a. Preprocessing

As before, except that for Montgomery we now require more, e.g. that M <R/4. (So n={overscore (n)}+2 if α=2).

b. Partitioning of d

As before.

c. Outer loop

As before, but the operation Mult is implemented slightly differently.

d/e. Implementation of modular multiplication in operation Mult Partitioning

As before, but for Quisquater we now require that b≦p−2.

f. Inner Loop

As before except that the last instruction “while z≧M do z←z−M” is removed.

Also, for Barrett, instead of taking b=n, n′=1 we allow other values of b, and instead of the special case B=B_(n−1,n+1) we now take B=B_(k,l) for other values of k and l. (For example k=n, l=b+1, or we choose α=4 and take e.g. k=n−1, l=b+2 or k=n−2, l=b+4).

g. Postprocessing

Now we can only guarantee that z<θM for some θ, typically θ=2 or θ=3. So in case of Barrett, the while statement (17) which was removed from the Loop-part of the operation Mult must now occur here: while z≧M do z←z−M. For Quisquater, this same operation is also required, but can possibly be combined with the other postprocessing (which does not change).

Resume of Conditions and Properties in “New” Algorithms

Quisquater

p≧b+2 (instead of b+1)

Barrett

Various possible values for the numbers k,l of the Barrett reduction operator B_(k,l). A good condition may be found as follows.

To compute x.y mod M by Barrett:

If y=(y₀y₁ . . . y_(└n/r┘−1)), with each y_(i)<b^(r) (b-ary digits), and given 0≦x<αM, 0≦z_(i+1)*≦βM, we have (z_(i)*−<z_(i)>_(M))/M=(B_(k,l)(z_(i))−<z_(i)>_(M))/M≦1+max(b^(k−n+1)+(α+β)b^(n−1+r−k−l), b^(k−n)+(α+β)b^(n+r−k−l)), which we need to be <θ. Certainly k+l≧n+r is needed. Classical: r=n, k=n−1, l=n+1, then β=0, α=1→z*<3M (θ=3).

New: r<n, α=β=θ. The condition is 1+max(b^(k−n+1)+2θb^(n−1+r−k−l), b^(k−n)+2θb^(n+r−k−l))<θ to have all results <θM.

Better condition: ${1 + \frac{b^{k}}{M} + \frac{2\theta \quad {Mb}^{r}}{b^{k + l}}} < \theta$

for all allowable M. Example: b=2 → $1 + \frac{2^{k}}{M} + \frac{2\theta M 2^{r}}{2^{k + l}} < \theta$

2^(n−1)≦M<2^(n)→1+max(2^(k−n)+2θ2^(n+r−k−l), 2^(k−n+1)+2θ2^(n−1+r−k−l))<θ.
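For given offsets of k and l, the smallest integer θ meeting the end-point form of the condition with α=β=θ can be found by direct search. The sketch below is only an aid for exploring parameter choices; the function name `min_theta`, the arguments dk=k−n and dl=l−r, and the search limit are assumptions.

```python
from fractions import Fraction

def min_theta(b: int, dk: int, dl: int, limit: int = 64):
    """Smallest integer theta with 1 + max(t1, t2) < theta, where dk = k-n, dl = l-r."""
    base = Fraction(b)
    for theta in range(1, limit + 1):
        # M minimal: b^(k-n+1) + 2*theta*b^(n-1+r-k-l)
        t1 = base ** (dk + 1) + 2 * theta * base ** (-1 - dk - dl)
        # M maximal: b^(k-n) + 2*theta*b^(n+r-k-l)
        t2 = base ** dk + 2 * theta * base ** (-dk - dl)
        if 1 + max(t1, t2) < theta:
            return theta
    return None                          # no admissible theta up to the limit

print(min_theta(4, 0, 1))     # k=n,   l=r+1 -> 6
print(min_theta(4, -1, 2))    # k=n-1, l=r+2 -> 3
print(min_theta(4, -2, 4))    # k=n-2, l=r+4 -> 2
```

The three results agree with the values 6, 3, and 2 quoted for b=4 in the detailed description.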

Remark: In both algorithms, these new conditions translate in terms of hardware into the use of slightly larger registers to store various intermediate variables. 

What is claimed is:
 1. A method for executing a decrypting modular exponentiation modulo M, by digit-wise calculating modular and looped multiplications X*Y mod M, according to a non-Montgomery procedure, with M a temporally steady but instance-wise non-uniform modulus, said method involving an iterative series of steps organized in a loop hierarchy wherein one or two first multiplications are executed to produce a first result, and the loop hierarchy is associated to a reduction of a size of the first result by one or more second multiplications to produce a second result, said method furthermore taking a measure for keeping a final result of such step below a predetermined multiplicity of said modulus, said method postpones a subtraction of the modulus, if needed, as pertaining to said measure to a terminal phase of the modular exponentiation, as being conditional to choosing one or more preprocessing parameters figuring in said method whilst maintaining overall temporal performance of said method.
 2. A method as claimed in claim 1, for executing the exponentiation along the Quisquater prescription, whilst choosing the value of the integer p=n−n′ not less than p≧b+2.
 3. A method as claimed in claim 1, for executing the exponentiation along the Barrett prescription, whilst choosing the value of the numbers k, l, in a suitable manner according to: 1+(b^(k)/M)+(2θMb^(R)/b^((k+l)))<θ.
 4. A device arranged for executing the method as claimed in claim 1.
 5. A device as claimed in claim 4, and having enhanced register width for therein storing intermediate results of the exponentiation. 